<!DOCTYPE html>
<html lang="en">
<head>
	<meta charset="UTF-8">
	<meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1">
	<title>Pattern Replace Char Filter | ElasticSearch 7.7 权威指南中文版</title>
	<meta name="keywords" content="ElasticSearch 权威指南中文版, elasticsearch 7, es7, 实时数据分析，实时数据检索" />
    <meta name="description" content="ElasticSearch 权威指南中文版, elasticsearch 7, es7, 实时数据分析，实时数据检索" />
    <!-- Give IE8 a fighting chance -->
    <!--[if lt IE 9]>
    <script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
    <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
    <![endif]-->
	<link rel="stylesheet" type="text/css" href="../static/styles.css" />
	<script>
	var _link = 'analysis-pattern-replace-charfilter.html';
    </script>
</head>
<body>
<div class="main-container">
    <section id="content">
        <div class="content-wrapper">
            <section id="guide" lang="zh_cn">
                <div class="container">
                    <div class="row">
                        <div class="col-xs-12 col-sm-8 col-md-8 guide-section">
                            <div style="color:gray; word-break: break-all; font-size:12px;">原英文版地址: <a href="https://www.elastic.co/guide/en/elasticsearch/reference/7.7/analysis-pattern-replace-charfilter.html" rel="nofollow" target="_blank">https://www.elastic.co/guide/en/elasticsearch/reference/7.7/analysis-pattern-replace-charfilter.html</a>, 原文档版权归 www.elastic.co 所有<br/>本地英文版地址: <a href="../en/analysis-pattern-replace-charfilter.html" rel="nofollow" target="_blank">../en/analysis-pattern-replace-charfilter.html</a></div>
                        <!-- start body -->
                  <div class="page_header">
<strong>重要</strong>: 此版本不会发布额外的bug修复或文档更新。最新信息请参考 <a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html" rel="nofollow">当前版本文档</a>。
</div>
<div id="content">
<div class="breadcrumbs">
<span class="breadcrumb-link"><a href="index.html">Elasticsearch Guide [7.7]</a></span>
»
<span class="breadcrumb-link"><a href="analysis.html">Text analysis</a></span>
»
<span class="breadcrumb-link"><a href="analysis-charfilters.html">Character filters reference</a></span>
»
<span class="breadcrumb-node">Pattern Replace Char Filter</span>
</div>
<div class="navheader">
<span class="prev">
<a href="analysis-mapping-charfilter.html">« Mapping character filter</a>
</span>
<span class="next">
<a href="analysis-normalizers.html">Normalizers »</a>
</span>
</div>
<div class="section">
<div class="titlepage"><div><div>
<h2 class="title">
<a id="analysis-pattern-replace-charfilter"></a>Pattern Replace Char Filter<a class="edit_me edit_me_private" rel="nofollow" title="Editing on GitHub is available to Elastic" href="https://github.com/elastic/elasticsearch/edit/7.7/docs/reference/analysis/charfilters/pattern-replace-charfilter.asciidoc">edit</a>
</h2>
</div></div></div>
<p>The <code class="literal">pattern_replace</code> character filter uses a regular expression to match
characters which should be replaced with the specified replacement string.
The replacement string can refer to capture groups in the regular expression.</p>
<div class="warning admon">
<div class="icon"></div>
<div class="admon_content">
<h3>Beware of Pathological Regular Expressions</h3>
<p>The pattern replace character filter uses
<a href="http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html" class="ulink" target="_top">Java Regular Expressions</a>.</p>
<p>A badly written regular expression could run very slowly or even throw a
StackOverflowError and cause the node it is running on to exit suddenly.</p>
<p>Read more about <a href="http://www.regular-expressions.info/catastrophic.html" class="ulink" target="_top">pathological regular expressions and how to avoid them</a>.</p>
</div>
</div>
<h3>
<a id="_configuration_24"></a>Configuration<a class="edit_me edit_me_private" rel="nofollow" title="Editing on GitHub is available to Elastic" href="https://github.com/elastic/elasticsearch/edit/7.7/docs/reference/analysis/charfilters/pattern-replace-charfilter.asciidoc">edit</a>
</h3>
<p>The <code class="literal">pattern_replace</code> character filter accepts the following parameters:</p>
<div class="informaltable">
<table border="0" cellpadding="4px">
<colgroup>
<col>
<col>
</colgroup>
<tbody valign="top">
<tr>
<td valign="top">
<p>
<code class="literal">pattern</code>
</p>
</td>
<td valign="top">
<p>
A <a href="http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html" class="ulink" target="_top">Java regular expression</a>. Required.
</p>
</td>
</tr>
<tr>
<td valign="top">
<p>
<code class="literal">replacement</code>
</p>
</td>
<td valign="top">
<p>
The replacement string, which can reference capture groups using the
<code class="literal">$1</code>..<code class="literal">$9</code> syntax, as explained
<a href="http://docs.oracle.com/javase/8/docs/api/java/util/regex/Matcher.html#appendReplacement-java.lang.StringBuffer-java.lang.String-" class="ulink" target="_top">here</a>.
</p>
</td>
</tr>
<tr>
<td valign="top">
<p>
<code class="literal">flags</code>
</p>
</td>
<td valign="top">
<p>
Java regular expression <a href="http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html#field.summary" class="ulink" target="_top">flags</a>.
Flags should be pipe-separated, eg <code class="literal">"CASE_INSENSITIVE|COMMENTS"</code>.
</p>
</td>
</tr>
</tbody>
</table>
</div>
<h3>
<a id="_example_configuration_15"></a>Example configuration<a class="edit_me edit_me_private" rel="nofollow" title="Editing on GitHub is available to Elastic" href="https://github.com/elastic/elasticsearch/edit/7.7/docs/reference/analysis/charfilters/pattern-replace-charfilter.asciidoc">edit</a>
</h3>
<p>In this example, we configure the <code class="literal">pattern_replace</code> character filter to
replace any embedded dashes in numbers with underscores, i.e <code class="literal">123-456-789</code> →
<code class="literal">123_456_789</code>:</p>
<div class="pre_wrapper lang-console">
<pre class="programlisting prettyprint lang-console">PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "char_filter": [
            "my_char_filter"
          ]
        }
      },
      "char_filter": {
        "my_char_filter": {
          "type": "pattern_replace",
          "pattern": "(\\d+)-(?=\\d)",
          "replacement": "$1_"
        }
      }
    }
  }
}

POST my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "My credit card is 123-456-789"
}</pre>
</div>
<div class="console_widget" data-snippet="snippets/1025.console"></div>
<p>The above example produces the following terms:</p>
<div class="pre_wrapper lang-text">
<pre class="programlisting prettyprint lang-text">[ My, credit, card, is, 123_456_789 ]</pre>
</div>
<div class="warning admon">
<div class="icon"></div>
<div class="admon_content">
<p>Using a replacement string that changes the length of the original
text will work for search purposes, but will result in incorrect highlighting,
as can be seen in the following example.</p>
</div>
</div>
<p>This example inserts a space whenever it encounters a lower-case letter
followed by an upper-case letter (i.e. <code class="literal">fooBarBaz</code> → <code class="literal">foo Bar Baz</code>), allowing
camelCase words to be queried individually:</p>
<div class="pre_wrapper lang-console">
<pre class="programlisting prettyprint lang-console">PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "char_filter": [
            "my_char_filter"
          ],
          "filter": [
            "lowercase"
          ]
        }
      },
      "char_filter": {
        "my_char_filter": {
          "type": "pattern_replace",
          "pattern": "(?&lt;=\\p{Lower})(?=\\p{Upper})",
          "replacement": " "
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  }
}

POST my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "The fooBarBaz method"
}</pre>
</div>
<div class="console_widget" data-snippet="snippets/1026.console"></div>
<p>The above returns the following terms:</p>
<div class="pre_wrapper lang-text">
<pre class="programlisting prettyprint lang-text">[ the, foo, bar, baz, method ]</pre>
</div>
<p>Querying for <code class="literal">bar</code> will find the document correctly, but highlighting on the
result will produce incorrect highlights, because our character filter changed
the length of the original text:</p>
<div class="pre_wrapper lang-console">
<pre class="programlisting prettyprint lang-console">PUT my_index/_doc/1?refresh
{
  "text": "The fooBarBaz method"
}

GET my_index/_search
{
  "query": {
    "match": {
      "text": "bar"
    }
  },
  "highlight": {
    "fields": {
      "text": {}
    }
  }
}</pre>
</div>
<div class="console_widget" data-snippet="snippets/1027.console"></div>
<p>The output from the above is:</p>
<div class="pre_wrapper lang-console-result">
<pre class="programlisting prettyprint lang-console-result">{
  "timed_out": false,
  "took": $body.took,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped" : 0,
    "failed": 0
  },
  "hits": {
    "total" : {
        "value": 1,
        "relation": "eq"
    },
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "text": "The fooBarBaz method"
        },
        "highlight": {
          "text": [
            "The foo&lt;em&gt;Ba&lt;/em&gt;rBaz method" <a id="CO420-1"></a><i class="conum" data-value="1"></i>
          ]
        }
      }
    ]
  }
}</pre>
</div>
<div class="calloutlist">
<table border="0" summary="Callout list">
<tr>
<td align="left" valign="top" width="5%">
<p><a href="#CO420-1"><i class="conum" data-value="1"></i></a></p>
</td>
<td align="left" valign="top">
<p>Note the incorrect highlight.</p>
</td>
</tr>
</table>
</div>
</div>
<div class="navfooter">
<span class="prev">
<a href="analysis-mapping-charfilter.html">« Mapping character filter</a>
</span>
<span class="next">
<a href="analysis-normalizers.html">Normalizers »</a>
</span>
</div>
</div>

                  <!-- end body -->
                        </div>
                        <div class="col-xs-12 col-sm-4 col-md-4" id="right_col">
                        
                        </div>
                    </div>
                </div>
            </section>
        </div>
    </section>
</div>
<script src="../static/cn.js"></script>
</body>
</html>